Swisslink: high-precision, context-free entity linking exploiting unambiguous labels
Webpages are an abundant source of textual information with manually annotated entity links, and are often used as a source of training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze statistical properties linked to manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
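As a rough illustration of what "context-free linking exploiting unambiguous labels" could look like, the sketch below links a mention only when its surface label has been annotated with exactly one entity in the training annotations. This is an assumption-based toy, not the Swisslink system itself: the function names, the lowercasing step, and the sample annotations are all hypothetical.

```python
from collections import defaultdict


def build_label_index(annotations):
    """Map each lowercased anchor label to the set of entities it was linked to."""
    index = defaultdict(set)
    for label, entity in annotations:
        index[label.lower()].add(entity)
    return index


def link_unambiguous(mentions, index):
    """Link a mention only when its label points to exactly one known entity."""
    links = {}
    for mention in mentions:
        candidates = index.get(mention.lower(), set())
        if len(candidates) == 1:  # ambiguous or unseen labels are skipped
            links[mention] = next(iter(candidates))
    return links


# Hypothetical (label, entity) pairs, as might be harvested from anchor text.
annotations = [
    ("Paris", "Paris_(France)"),
    ("Paris", "Paris_Hilton"),
    ("EPFL", "EPFL_(university)"),
]
index = build_label_index(annotations)
links = link_unambiguous(["Paris", "EPFL", "Zurich"], index)
print(links)
```

Skipping ambiguous labels trades recall for precision, which matches the stated goal of a highly accurate corpus; the abstract does not specify how the actual approaches make this trade-off.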
A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality
Microtask crowdsourcing is increasingly critical to the creation of extremely
large datasets. As a result, crowd workers spend weeks or months repeating the
exact same tasks, making it necessary to understand their behavior over these
long periods of time. We utilize three large, longitudinal datasets of nine
million annotations collected from Amazon Mechanical Turk to examine claims
that workers fatigue or satisfice over these long periods, producing lower
quality work. We find that, contrary to these claims, workers are extremely
stable in their quality over the entire period. To understand whether workers
set their quality based on the task's requirements for acceptance, we then
perform an experiment where we vary the required quality for a large
crowdsourcing task. Workers did not adjust their quality based on the
acceptance threshold: workers who were above the threshold continued working at
their usual quality level, and workers below the threshold self-selected
themselves out of the task. Capitalizing on this consistency, we demonstrate
that it is possible to predict workers' long-term quality using just a glimpse
of their quality on the first five tasks.
Comment: 10 pages, 11 figures, accepted CSCW 201
Hippocampus: answering memory queries using transactive search
Memory queries denote queries where the user is trying to recall from his/her past personal experiences. Neither Web search nor structured queries can effectively answer this type of query, even when supported by Human Computation solutions. In this paper, we propose a new approach to answer memory queries that we call Transactive Search: the user-requested memory is reconstructed from a group of people by exchanging pieces of personal memories in order to reassemble the overall memory, which is stored in a distributed fashion among members of the group. We experimentally compare our proposed approach against a set of advanced search techniques including the use of Machine Learning methods over the Web of Data, online Social Networks, and Human Computation techniques. Experimental results show that Transactive Search significantly outperforms existing search approaches in effectiveness for memory queries.
Crowdsourcing in China: Exploring the Work Experience of Solo Crowdworkers and Crowdfarm Workers
Recent research highlights the potential of crowdsourcing in China. Yet very few studies explore the workplace context and experiences of Chinese crowdworkers. Those that do focus mainly on the work experiences of solo crowdworkers and do not deal with issues pertaining to the substantial number of people working in ‘crowdfarms’. This article addresses this gap as one of its primary concerns. Drawing on a study that involves 48 participants, our research explores, compares and contrasts the work experiences of solo crowdworkers with those of crowdfarm workers. Our findings illustrate that the work experiences and context of solo workers and crowdfarm workers are substantially different with regard to their motivations, the ways they engage with crowdsourcing, the tasks they work on, and the crowdsourcing platforms they utilize. Overall, our study contributes to furthering the understanding of the work experiences of crowdworkers in China.
SectionLinks: Mapping Orphan Wikidata Entities onto Wikipedia Sections
Wikidata is a key resource for the provisioning of structured data on several Wikimedia projects, including Wikipedia. By design, all Wikipedia articles are linked to Wikidata entities; such mappings represent a substantial source of both semantic and structural information. However, only a small subgraph of Wikidata is mapped in that way: only about 10% of the sitelinks are linked to English Wikipedia, for example. In this paper, we describe a resource we have built and published to extend this subgraph and add more links between Wikidata and Wikipedia. We start from the assumption that a number of Wikidata entities can be mapped onto Wikipedia sections, in addition to Wikipedia articles. The resource we put forward contains tens of thousands of such mappings, hence considerably enriching the highly structured Wikidata graph with encyclopedic knowledge from Wikipedia.
Pick-a-crowd: tell me what you like, and I'll tell you what to do: a crowdsourcing platform for personalized human intelligence task assignment based on social networks
Crowdsourcing makes it possible to build hybrid online platforms that combine scalable information systems with the power of human intelligence to complete tasks that are difficult to tackle for current algorithms. Examples include hybrid database systems that use the crowd to fill missing values or to sort items according to subjective dimensions such as picture attractiveness. Current approaches to Crowdsourcing adopt a pull methodology where tasks are published on specialized Web platforms where workers can pick their preferred tasks on a first-come-first-served basis. While this approach has many advantages, such as simplicity and short completion times, it does not guarantee that the task is performed by the most suitable worker. In this paper, we propose and extensively evaluate a different Crowdsourcing approach based on a push methodology. Our proposed system carefully selects which workers should perform a given task based on worker profiles extracted from social networks. Workers and tasks are automatically matched using an underlying categorization structure that exploits entities extracted from the task descriptions on the one hand, and categories liked by the user on social platforms on the other hand. We experimentally evaluate our approach on tasks of varying complexity and show that our push methodology consistently yields better results than usual pull strategies.
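The push-style matching described in this abstract can be illustrated with a minimal sketch. It assumes task entities and worker likes have already been reduced to plain sets of category strings, and it scores each worker by Jaccard overlap; both the scoring function and the names `jaccard` and `push_task` are hypothetical choices for illustration, not the matching method the Pick-a-crowd authors report.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of category labels."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def push_task(task_entities, worker_likes, k=2):
    """Return up to k workers whose liked categories best overlap the task."""
    scored = [
        (jaccard(task_entities, likes), worker)
        for worker, likes in worker_likes.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [worker for score, worker in scored[:k] if score > 0]


# Hypothetical task and worker profiles.
task_entities = {"football", "fifa"}
worker_likes = {
    "alice": {"football", "fifa", "tennis"},
    "bob": {"cooking"},
    "carol": {"football", "movies"},
}
assigned = push_task(task_entities, worker_likes, k=2)
print(assigned)
```

In a pull setting any of the three workers could pick up the task first; here the task is offered only to the workers whose profiles overlap with it.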